Reading - WordNet, 5 papers

Greg Detre

Thursday, 04 April, 2002

Two kinds of building blocks are distinguished in the source files: word forms and word meanings. Word forms are represented in their familiar orthography; word meanings are represented by synonym sets, that is, lists of synonymous word forms that are interchangeable in some context. Two kinds of relations are recognized: lexical and semantic. Lexical relations hold between word forms; semantic relations hold between word meanings.
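The distinction between the two building blocks and the two kinds of relations can be made concrete with a small data model. This is only a sketch for these notes, not WordNet's actual source-file format: the class names `Synset` and `Lexicon`, the method names, and the relation labels are all hypothetical, and the example entries are chosen just to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Synset:
    """A word meaning, represented by the word forms that can express it."""
    word_forms: frozenset


@dataclass
class Lexicon:
    """Hypothetical container; not WordNet's real file format."""
    synsets: list = field(default_factory=list)
    # Semantic relations hold between word meanings: (label, synset, synset).
    semantic: set = field(default_factory=set)
    # Lexical relations hold between word forms: (label, form, form).
    lexical: set = field(default_factory=set)

    def add_synset(self, *forms):
        """Create a synset from the given word forms and store it."""
        synset = Synset(frozenset(forms))
        self.synsets.append(synset)
        return synset

    def senses(self, form):
        """Return every synset that contains the given word form."""
        return [s for s in self.synsets if form in s.word_forms]


lex = Lexicon()
plank = lex.add_synset("board", "plank")          # one meaning of "board"
committee = lex.add_synset("board", "committee")  # another meaning
lumber = lex.add_synset("lumber", "timber")

# Illustrative semantic relation: links two meanings, not two spellings.
lex.semantic.add(("hypernym", plank, lumber))
# Illustrative lexical relation: links two word forms directly.
lex.lexical.add(("antonym", "wet", "dry"))

print(len(lex.senses("board")))  # number of meanings stored for "board"
```

A polysemous word form such as "board" simply appears in more than one synset, so the number of meanings of a form falls out of the representation without being stored separately.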

The most obvious difference between WordNet and a standard dictionary is that WordNet divides the lexicon into five categories: nouns, verbs, adjectives, adverbs, and function words. Actually, WordNet contains only nouns, verbs, adjectives, and adverbs. The relatively small set of English function words is omitted on the assumption (supported by observations of the speech of aphasic patients: Garrett, 1982) that they are probably stored separately as part of the syntactic component of language. The realization that syntactic categories differ in subjective organization emerged first from studies of word associations. Fillenbaum and Jones (1965), for example, asked English-speaking subjects to give the first word they thought of in response to highly familiar words drawn from different syntactic categories. The modal response category was the same as the category of the probe word: noun probes elicited noun responses 79% of the time, adjectives elicited adjectives 65% of the time, and verbs elicited verbs 43% of the time. Since grammatical speech requires a speaker to know (at least implicitly) the syntactic privileges of different words, it is not surprising that such information would be readily available. How it is learned, however, is more of a puzzle: it is rare in connected discourse for adjacent words to be from the same syntactic category, so Fillenbaum and Jones's data cannot be explained as association by contiguity.

Fortunately, an alternative indicator of familiarity is available. It has been known at least since Zipf (1945) that frequency of occurrence and polysemy are correlated. That is to say, on the average, the more frequently a word is used the more different meanings it will have in a dictionary. An intriguing finding in psycholinguistics (Jastrzembski, 1981) is that polysemy seems to predict lexical access times as well as frequency does. Indeed, if the effect of frequency is controlled by choosing words of equivalent frequencies, polysemy is still a significant predictor of lexical decision times.
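One way to check a claim like "frequency and polysemy are correlated" on a word list is a rank correlation between usage counts and dictionary sense counts. The sketch below uses Spearman's coefficient, which is my choice for illustration and not necessarily the statistic used by Zipf or Jastrzembski. The counts in `toy` are invented purely to exercise the code; they are not measurements from any corpus or dictionary.

```python
from statistics import mean


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# INVENTED numbers for illustration only: word -> (usage count, sense count).
toy = {
    "run": (900, 12),
    "set": (850, 15),
    "table": (300, 5),
    "lexicon": (20, 1),
    "aphasia": (5, 1),
}
freqs = [f for f, _ in toy.values()]
senses = [s for _, s in toy.values()]
rho = spearman(freqs, senses)
print(f"Spearman rho on the toy data: {rho:.2f}")
```

With real data, `freqs` would come from corpus counts and `senses` from the number of meanings listed per word form. Testing whether polysemy still predicts decision times once frequency is controlled would need a further step, such as comparing words matched on frequency.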